<html>
<head>
<title>Sparse Row Store</title>
</head>
<body>

<p>

This package provides support for treating normal B+Trees using a
"sparse row store" pattern and can be applied to both local B+Trees
and scale-out indices.  A sparse row store is a data model in which
the B+Tree keys are formed as:

</p>

<pre>

[schemaName][primaryKey][columnName][timestamp]

</pre>

<p>

and the values are the value for a given column for that primary key.
People familar with the Google "bigtable" project will recognize this
data model.

</p>

<p>

Sparse row stores are flexible, scaleable and highly efficient data
structures.  A single logical "row" is made up of a key range spanning
all entries within a given schema and having the same primary key -
when multiple schemas are used, the facet of the row for each schema
must be read explicitly, but this operation can occur in parallel.
They allow applications to cluster data by schema and by primary key,
and make it easy to store only the non-null values in a given "row".
Likewise, on retrieval the application can ask for all columns for a
given row or only those satisifying some constraint.

</p>

<p>

Timestamps are either generated by the application, in which case they
define the semantics of a write-write conflict, or on write by the
index.  In the latter case, write-write conflicts never arise.
Regardless of how timestamps are generated, the use of the timestamp
in the <em>key</em> requires that applications specify filters that
are applied during row scans to limit the data points actually
returned as part of the row.  For example, only returning the most
recent column values no later than a given timestamp for all columns
for some primary key.

</p>

<p>

For example, assuming records with the following columns

<ul>
<li>Id</li>
<li>Name</li>
<li>Employer</li>
<li>DateOfHire</li>
</ul>

would be represented as a series of index entries as follows:

</p>

<pre>

[employee][12][DateOfHire][t0] : [4/30/02]
[employee][12][DateOfHire][t1] : [4/30/05]
[employee][12][Employer][t0]   : [SAIC]
[employee][12][Employer][t1]   : [SYSTAP]
[employee][12][Id][t0]         : [12]
[employee][12][Name][t0]       : [Bryan Thompson]

</pre>

<p>

The records have been grouped under a schema named "employee".  All
records have the same primary key [12], which is also the value of the
<code>Id</code> column.  The entries with timestamp <code>t0</code>
represent the 1st logical row and specify the date of hire at SAIC.
The entries with timestamp <code>t1</code> are the <em>modified</em>
column values (unmodified column values are not rewritten)
representing the date of hire at SYSTAP - the Id and Name column
values from <code>t1</code> are unchanged.  Note that all entries
share a common prefix which would be compressed down when it is stored
in the indices.

</p>

<p>

There are two logical rows in the example above

<ol>

<li>Name = Bryan Thompson, Id = 12, DateOfHire = 4/30/02, Employer =
SAIC</li>

<li>Name = Bryan Thompson, Id = 12, DateOfHire = 4/30/05, Employer =
SYSTAP</li> </ol>

 A query for all columns with the primary key "12" and the "employee"
scheme would result in a row scan whose starting key was
<code>[employee][12]</code>.  If you want only the most current row as
of a given timestamp, then you would specify a filter for the row scan
that only accepted the most current entries for any column value
having a timestamp not greater than the given timestamp for any column
name.

</p>

<p>

The sparse row store is capable of storing historical revisions
efficiently, but if too many revisions are stored for a given primary
key then retrieval performance for logical rows for that key will
eventually degrade.  In order to maintain performance for rows that
will be updated frequently, a history policy should be periodically
applied to expunge old data from the indices.  Typical policies will
either keep only a limited number of revisions of a row, e.g., the
last 3 revisions, or all revisions within a certain number of hours or
days.  Since only modified column values are stored, care must be
taken to expunge only column values that (a) fail the history policy;
and (b) have been overwritten.  When using partitioned indices, the
history policy is applied during compaction (when the segments that
comprise a partition are merged).

</p>

<p>

The sparse row store provides a guarentee of atomic read and writes at
the logical "row" level.  A logical row corresponds to the (ordered)
set of entries in an index having a common schema and primary
key. This guarentee of atomic row operations is achieved together with
very high concurrent read and write rates by imposing the following
constraints:

<ol>

<li> The client uses a unisolated reads and writes to communicate with
the data services.</li>

<li> The client never splits read or write operations across a
[schema][primaryKey] prefix.  More typically, the client restricts
read and write operations to a single logical row at a time.  Together
with the use of unisolated operations this guarentees row level
atomicity of read and write operations on a given index
partition.</li>

<li> When using a partitioned index, the data services never split
across a [schema][primaryKey] prefix.  In order to enforce this
constraint (and in order to support auto-timestamps), a partitioned
index must be provisioned specifically for a sparse row store. 
Likewise, per-schema provisioning may also be used to give guidence
on the latency or redundency requirements for a given schema.</li>

</ol>

</p>

<p>

@todo work through provisioning and security models.

</p>

</body>
</html>
